GetSafe-Logo-01-aspect-ratio-830-827-768x765.png

1 | Business Objective

In this notebook, I analysed GetSafe insurtech company's data. The data consists of 6,000 policyholders with 11 attributes that describe their region, sales channel, platform, and insured coverage information, including their age, and product type. The variable we would like to predict is the customer conversion, ages, that were traced by GetSafe.


The first aim of this analysis is to find interesting insights to grow with our customers. For this purpose, I applied exploratory data analysis after featuring and cleaning the data. The second aim is to identify the customer classification that leads to a higher conversion rate. Getsafe marketing team had already built certain prooxy matrics for acquiring young age customers. To test the modification to an app feature for the mobile app and website app, I utlised the A/B test to measure the target of the conversion rate for the more likely customer to be converted.

2 | Data Overview

Importing Libraries

In [12]:

Importing and reading Data

In [13]:
Out[13]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant
1172 29 2020-09-12 13:44:00 en aggregator contents 102xx Big 7 iOS app 1 A
1628 39 2020-06-06 0:09:00 en direct liability 227xx Big 7 Android app 0 B
549 36 2020-10-19 19:30:00 de direct liability 217xx rural Android app 1 B
5968 57 2020-09-01 17:34:00 de aggregator dog 835xx town NaN NaN 0 A
878 25 2020-04-26 14:15:00 de aggregator legal 453xx urban iOS app 1 A
In [14]:
There are 6,000 observations and 11 columns in the data set.
In [15]:
Out[15]:
Index(['age', 'joined_getsafe_at', 'language', 'channel', 'first_product',
       'zip_bucket', 'area_classification', 'most_used_os', 'buying_platform',
       'converted', 'variant'],
      dtype='object')
In [16]:
Out[16]:
age                     int64
joined_getsafe_at      object
language               object
channel                object
first_product          object
zip_bucket             object
area_classification    object
most_used_os           object
buying_platform        object
converted               int64
variant                object
dtype: object
In [17]:
Out[17]:
age converted
count 6000.000000 6000.000000
mean 32.721833 0.474500
std 27.635606 0.499391
min 18.000000 0.000000
25% 25.000000 0.000000
50% 30.000000 0.000000
75% 37.000000 1.000000
max 2013.000000 1.000000

The three columns ; which are "area_classification", "most_used_os" and "buying platform", have a missing entries. Further more the maximum age is 2013 which is wrong.

In [18]:
Out[18]:
<AxesSubplot:>
In [19]:
The number of missing entries before cleaning: 5.27 %
In [20]:

The output shows that one outlier in age is higher tahn the maximum age for the policy holder;moreover,NaN's corrections are needed, so the next section is dedicted for cleaning the data.

3 | Data Cleaning

  • Deleting the maximum age 2013 which is wrong.
In [21]:
Out[21]:
count    5999.000000
mean       32.391732
std        10.485397
min        18.000000
25%        25.000000
50%        30.000000
75%        37.000000
max        91.000000
Name: age, dtype: float64
  • Removing all the rows that contain a missing value
In [22]:
Out[22]:
age                       0
joined_getsafe_at         0
language                  0
channel                   0
first_product             0
zip_bucket                0
area_classification      19
most_used_os            586
buying_platform        2871
converted                 0
variant                   0
dtype: int64
In [23]:
In [24]:
Out[24]:
<AxesSubplot:>
In [25]:
Out[25]:
age                    0
joined_getsafe_at      0
language               0
channel                0
first_product          0
zip_bucket             0
area_classification    0
most_used_os           0
buying_platform        0
converted              0
variant                0
dtype: int64

4.1 | Summary Statistics of Numeric Columns

In [26]:
Out[26]:
count mean std min 25% 50% 75% max
age 2980.0 32.734899 10.023647 18.0 25.0 31.0 38.0 91.0
converted 2980.0 0.658389 0.474329 0.0 0.0 1.0 1.0 1.0

4.2 | Summary Statistics of Categorical Columns

In [27]:
Out[27]:
joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform variant
count 2980 2980 2980 2980 2980 2980 2980 2980 2980
unique 2788 2 10 8 592 5 2 2 2
top 2020-09-27 22:11:00 de direct liability 104xx rural Android web B
freq 3 2409 670 1604 42 1133 1780 1607 1491

5 | Exploratory Data Analysis

In [28]:
In [29]:

As the wrong maximum age had been deleted at cleaning data section, so currently The lower value of 18 and the maximum value of 91; as a result, the typical age range is between 18 and 91.

In [30]:
In [31]:
Out[31]:
<seaborn.axisgrid.FacetGrid at 0x7ff0049204f0>

This shows there are no observations outside the 18-91 range. Therefore, it is a proper action to cap the values. A column for the age bucket is added to enhance the focus on this significant variable.

In [32]:
In [33]:
Out[33]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant in_age
679 25 2020-02-26 19:38:00 de partner liability 868xx rural Android app 1 B in 20's
2954 37 2020-03-03 9:55:00 en direct liability 803xx Big 7 iOS web 0 A in 30's
1054 33 2020-07-24 12:40:00 en referral contents 107xx Big 7 Android web 1 B in 30's
1324 39 2020-04-14 19:22:00 de partner liability 728xx rural Android web 0 A in 30's
1444 49 2020-01-17 9:18:00 de referral dental 793xx rural Android app 1 B in 40's
In [34]:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2980 entries, 0 to 2979
Data columns (total 12 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   age                  2980 non-null   int64 
 1   joined_getsafe_at    2980 non-null   object
 2   language             2980 non-null   object
 3   channel              2980 non-null   object
 4   first_product        2980 non-null   object
 5   zip_bucket           2980 non-null   object
 6   area_classification  2980 non-null   object
 7   most_used_os         2980 non-null   object
 8   buying_platform      2980 non-null   object
 9   converted            2980 non-null   int64 
 10  variant              2980 non-null   object
 11  in_age               2980 non-null   object
dtypes: int64(2), object(10)
memory usage: 279.5+ KB
In [35]:
Out[35]:
Index(['age', 'joined_getsafe_at', 'language', 'channel', 'first_product',
       'zip_bucket', 'area_classification', 'most_used_os', 'buying_platform',
       'converted', 'variant', 'in_age'],
      dtype='object')
In [36]:
Out[36]:
age converted
count 2980.000000 2980.000000
mean 32.734899 0.658389
std 10.023647 0.474329
min 18.000000 0.000000
25% 25.000000 0.000000
50% 31.000000 1.000000
75% 38.000000 1.000000
max 91.000000 1.000000
In [37]:
Out[37]:
array(['legal', 'liability', 'contents', 'travel', 'dental', 'bike',
       'dog', 'accident'], dtype=object)
In [38]:
Out[38]:
liability    1604
contents      594
travel        293
legal         241
bike          123
dog            64
dental         56
accident        5
Name: first_product, dtype: int64
In [39]:
liability160453.8%contents59419.9%travel2939.83%legal2418.09%bike1234.13%dog642.15%dental561.88%accident50.168%
liabilitycontentstravellegalbikedogdentalaccidentProduct DistributionProduct

The Liability Insurance prodict records the highest sales by 53.8% , then Content insurance product by 19.9%.

In [40]:
Out[40]:
first_product Minimum_age Maximum_Age Average_Age
0 accident 21 48 33.400000
1 bike 19 66 32.674797
2 contents 18 79 33.611111
3 dental 20 58 32.267857
4 dog 19 58 40.000000
5 legal 19 67 35.402490
6 liability 18 91 32.041147
7 travel 18 73 31.078498
In [41]:
Out[41]:
18
In [42]:
Out[42]:
91
In [43]:
Out[43]:
in_age Frequency
0 in 10's 91
1 in 20's 1249
2 in 30's 1007
3 in 40's 402
4 in 50's 184
5 in 60's 34
6 in 70's 12
7 in 90's 1
In [44]:
in 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 90's124910074021849134121
Age Distribution

GetSafe's potential cusotmer are mainly in 20s and 30s .

In [45]:
Out[45]:
first_product in_age Frequency
0 accident in 20's 2
1 accident in 30's 1
2 accident in 40's 2
3 bike in 10's 1
4 bike in 20's 49
5 bike in 30's 49
6 bike in 40's 17
7 bike in 50's 4
8 bike in 60's 3
9 contents in 10's 20
10 contents in 20's 217
11 contents in 30's 212
12 contents in 40's 97
13 contents in 50's 38
14 contents in 60's 8
15 contents in 70's 2
16 dental in 20's 22
17 dental in 30's 26
18 dental in 40's 6
19 dental in 50's 2
20 dog in 10's 2
21 dog in 20's 9
22 dog in 30's 19
23 dog in 40's 21
24 dog in 50's 13
25 legal in 10's 2
26 legal in 20's 68
27 legal in 30's 93
28 legal in 40's 54
29 legal in 50's 22
30 legal in 60's 2
31 liability in 10's 55
32 liability in 20's 741
33 liability in 30's 508
34 liability in 40's 179
35 liability in 50's 93
36 liability in 60's 19
37 liability in 70's 8
38 liability in 90's 1
39 travel in 10's 11
40 travel in 20's 141
41 travel in 30's 99
42 travel in 40's 26
43 travel in 50's 12
44 travel in 60's 2
45 travel in 70's 2
In [46]:
liabilitycontentstravellegalbikedogdentalaccidentin 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 90'sin 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 30'sin 20'sin 40'sin 50'sin 10'sin 60'sin 20'sin 30'sin 40'sin 50'sin 60'sin 10'sin 40'sin 30'sin 50'sin 20'sin 10'sin 30'sin 20'sin 40'sin 50'sin 20'sin 40'sin 30's74150817993551981217212973820821419926121122936854222249491743121191392262262221
Product Age Frequency

The number of insured persons under a personal liability policy is the highest; especially, the young insured who age in their 20s. According to a survey conducted by Evgenia Koptyug (2022), the statistics show that the results of the survey conducted in Germany on living situations among the population in 2021, broken down by age group that 77.5% of 20-to-29-year olds are rentals. It is obvious that GetSafe smartly succeeded to target this age group through its liability product.

However, 46.2% of 14 to 19 years old are rentals in Germany according to the same survey, and this percentage is quite high; therefore, GetSafe has a great opportunity to target this age group and increase its sales of personal liability product, and this can be done by a marketing campaign and adding new features meeting the customer's need who are between 14 to 19 years old; as a result, that will be positively affect the GetSafe's growth.

In [47]:
Out[47]:
array(['rural', 'urban', 'Big 7', 'large town', 'town'], dtype=object)
In [48]:
Out[48]:
rural         1133
Big 7          846
large town     487
town           295
urban          219
Name: area_classification, dtype: int64
In [49]:
rural113338%Big 784628.4%large town48716.3%town2959.9%urban2197.35%
ruralBig 7large towntownurbanRegion ClassificationRegion

As it is shown, the rural area records the highest number of the policies for GetSafe ,after it comes the Big 7 areas.

In [50]:
ruralBig 7large towntownurban20040060080010001200
4006008001000Area classificationRegionsFrequency Of Area
In [51]:
Out[51]:
area_classification first_product Product_Count
0 Big 7 accident 1
1 Big 7 bike 44
2 Big 7 contents 193
3 Big 7 dental 23
4 Big 7 dog 10
5 Big 7 legal 68
6 Big 7 liability 436
7 Big 7 travel 71
8 large town bike 27
9 large town contents 96
10 large town dental 13
11 large town dog 13
12 large town legal 29
13 large town liability 259
14 large town travel 50
15 rural accident 1
16 rural bike 31
17 rural contents 199
18 rural dental 12
19 rural dog 31
20 rural legal 112
21 rural liability 629
22 rural travel 118
23 town accident 1
24 town bike 11
25 town contents 60
26 town dental 7
27 town dog 9
28 town legal 14
29 town liability 164
30 town travel 29
31 urban accident 2
32 urban bike 10
33 urban contents 46
34 urban dental 1
35 urban dog 1
36 urban legal 18
37 urban liability 116
38 urban travel 25
In [52]:
ruralBig 7large towntownurbanliabilitycontentstravellegalbikedogdentalaccidentliabilitycontentstravellegalbikedentaldogaccidentliabilitycontentstravellegalbikedentaldogliabilitycontentstravellegalbikedogdentalaccidentliabilitycontentstravellegalbikeaccidentdentaldog6291991181123131121436193716844231012599650292713131646029141197111646251810211
Product Region and Count.
In [53]:
Out[53]:
first_product in_age area_classification frequency
0 accident in 20's Big 7 1
1 accident in 20's urban 1
2 accident in 30's town 1
3 accident in 40's rural 1
4 accident in 40's urban 1
... ... ... ... ...
162 travel in 50's urban 1
163 travel in 60's rural 1
164 travel in 60's town 1
165 travel in 70's large town 1
166 travel in 70's rural 1

167 rows × 4 columns

In [54]:
liabilitycontentstravellegalbikedogdentalaccidentin 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 90'sin 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 30'sin 20'sin 40'sin 50'sin 10'sin 60'sin 20'sin 30'sin 40'sin 50'sin 60'sin 10'sin 40'sin 30'sin 50'sin 20'sin 10'sin 30'sin 20'sin 40'sin 50'sin 20'sin 40'sin 30'sruralBig 7large towntownurbanruralBig 7large towntownurbanrurallarge townBig 7townurbanruralBig 7large towntownurbanruralBig 7townurbanlarge townruralBig 7large towntownurbanruraltownBig 7Big 7rurallarge towntownurbanruralBig 7large townurbantownruraltownBig 7large townurbanBig 7rurallarge townurbantowntownurbanBig 7large townruralBig 7ruralBig 7large townruralBig 7large towntownurbanruralBig 7large townurbantownrurallarge townBig 7townurbanruralBig 7large townurbanruralBig 7large towntownruraltownlarge townruralruralBig 7large towntownurbanruralBig 7large townurbantownruralBig 7large towntownurbanrurallarge townBig 7urbanBig 7ruralruraltownruralBig 7large towntownurbanBig 7large townruralurbantownlarge townBig 7ruraltownurbantownBig 7rurallarge townBig 7Big 7rurallarge townBig 7townruralBig 7townlarge townrurallarge towntownurbanBig 7ruraltownlarge townBig 7large townruraltownurbanBig 7large towntownruralruralBig 7Big 7townBig 7urbanruralurbantown26422112873551901577450377639331714571211112271076511222244174614817178578241312381918157151055394331441154322716123530141191463216411721111114032876262410623287431243311111813855231094375311211211873311332921143221266118752421111111
Product age and region.
In [55]:
050100150200250in 20'sin 10'sin 30'sin 40'sin 50'sin 60'sin 70'sin 90's
Big 7urbantownrurallarge townThe frequency of the issued policies increases with rural and big 7 areas among 20's and 30'sPolicies frequenciesAge Group

For each region, the sunburst shows the distribution of the insurance products in the regions. Liability and content insurance policy reached the highest sales in rural and big 7 areas.

The main difference between personal liability and content insurance is that a content insurance policy covers your belongings. If it doesn’t include tenants’ liability, it won’t cover any damage to your landlord’s belongings.


If the customer damages his/her belongings, it won’t affect the deposit return on the customer's rental. But the damage to the landlord’s belongings could cause problems when the customer wants his deposit back at the end of the tenancy period. This is where tenants’ liability insurance could help.

As shown above the interrelation between personal liability and content policy is quite strong, and that can create growth opportunities through cross-selling and making marketing campaigns highlighting the coverage of each policy and why the tenant needs both policies to be protected. Moreover, the GetSafe website and mobile platform had already been developed to embed the suggestion of the personal liability policies in its two types which are comfort and premium plans when the customer tries to buy a liability policy. As a result, the visitor of the website gets exposure to the two plans simultaneously.

In addition to the region and product type, the young ages are positively correlated with the type of the products. Among the multiple types of products, the frequency of issued policies tends to increase with certain regions.

Christian Wiens (CEO) was interviewed in April 2021 when he talked about the company's expansion plan in the UK, he expressed the company's aim which is "the insurance age" to replicate the success that it had in Germany, and the above analysis showing that the company's KPI is achieved.

However, the analysis shows that the highest sales are reached by only two insurance policies personal liability and content policy. In this case, GetSafe company needs to develop other insurance products such as dental, travel, dog, and accident.

By selecting dental insurance policy as an example for growth outside (liability and content product), we need to look at the other insurtech companies which are presented with the same type of product (dental Insurance) but with different prices and features. The below table comparing between GetSafe and the other competitors which are Ottonova and Feather. First, we need to understand the potential of the others, and by concentrating on the price and features of the plans. Obviously, Ottonova presents better prices and multiple features of its plan such as Economy, business, or first class. So GetSafe needs to reconsider either its price or the features of its plan. Assuming, GetSafe decided to reconsider its price, It is recommended that the underwriting team needs to discuss this issue with the actuary and the reinsurance company which is Munich Re for GetSafe. On the other hand, if the GetSafe decided to develop its features of the plan, It is recommended that the company replicate the same product development strategy which was applied in GetSafe personal liability policy which has multiple features of plans that are "Comfort plan" and "Premium plan", and this variation in the features of the plan has not been applied on GetSafe dental policy though it approved its success with personal liability policy as it is shown in the above analysis. That is exactly what makes Ottonova's growth faster than GetSafe. In conclusion, It is recommended the product development team has to create new variations for the features of the dental plan.

Screen%20Shot%202022-11-28%20at%2011.48.56%20AM.png

In [56]:
Out[56]:
array(['web', 'app'], dtype=object)
In [57]:
Out[57]:
web    1607
app    1373
Name: buying_platform, dtype: int64
In [58]:
web160753.9%app137346.1%
webappPlatform DistributionPlatform

It is interesting to see that the web platform are performing slightly better than the mobile Application.

In [59]:
Out[59]:
in_age buying_platform platform_count
0 in 10's app 47
1 in 10's web 44
2 in 20's app 654
3 in 20's web 595
4 in 30's app 459
5 in 30's web 548
6 in 40's app 141
7 in 40's web 261
8 in 50's app 59
9 in 50's web 125
10 in 60's app 10
11 in 60's web 24
12 in 70's app 3
13 in 70's web 9
14 in 90's web 1
In [60]:
in 20'sin 30'sin 40'sin 50'sin 10'sin 60'sin 70'sin 90'sappwebwebappwebappwebappappwebwebappwebappweb6545955484592611411255947442410931
Platform Vs. Age

Precisely , the cusotmers in 20s are equaly using the webiste and the mobile application though it was expected the mobile application will be higher than the website . This can be to a reason that the Getsafe website was smoothly designed to meet the expectations of the 20s generation.

In [61]:
Out[61]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant in_age
2874 19 2020-04-10 5:09:00 de affiliate_deal travel 032xx rural Android web 0 A in 10's
388 27 2020-02-22 21:30:00 de aggregator liability 670xx large town iOS app 1 B in 20's
558 38 2020-02-29 14:11:00 de affiliate_deal bike 310xx rural Android web 1 A in 30's
1180 51 2020-08-11 11:08:00 de affiliate liability 249xx town iOS web 0 A in 50's
2119 35 2020-10-13 19:15:00 de aggregator contents 825xx rural Android app 1 A in 30's
In [62]:
Out[62]:
channel first_product chanel_frequency
0 affiliate bike 3
1 affiliate contents 32
2 affiliate dental 4
3 affiliate dog 2
4 affiliate legal 13
5 affiliate liability 274
6 affiliate travel 7
7 affiliate_deal bike 70
8 affiliate_deal contents 30
9 affiliate_deal dental 1
10 affiliate_deal legal 20
11 affiliate_deal liability 217
12 affiliate_deal travel 221
13 aggregator bike 1
14 aggregator contents 131
15 aggregator dog 3
16 aggregator legal 49
17 aggregator liability 393
18 direct accident 5
19 direct bike 21
20 direct contents 155
21 direct dental 30
22 direct dog 13
23 direct legal 57
24 direct liability 359
25 direct travel 30
26 offline contents 1
27 offline liability 1
28 paid_search contents 54
29 paid_search dental 5
30 paid_search dog 11
31 paid_search legal 52
32 paid_search liability 67
33 paid_search travel 2
34 paid_social contents 110
35 paid_social dental 1
36 paid_social dog 29
37 paid_social legal 36
38 paid_social liability 116
39 paid_social travel 1
40 partner bike 8
41 partner contents 9
42 partner liability 39
43 partner travel 21
44 podcast contents 1
45 podcast dental 1
46 podcast liability 5
47 referral bike 20
48 referral contents 71
49 referral dental 14
50 referral dog 6
51 referral legal 14
52 referral liability 133
53 referral travel 11
In [63]:
directaggregatoraffiliate_dealaffiliatepaid_socialreferralpaid_searchpartnerpodcastofflineliabilitycontentslegaldentaltravelbikedogaccidentliabilitycontentslegaldogbiketravelliabilitybikecontentslegaldentalliabilitycontentslegaltraveldentalbikedogliabilitycontentslegaldogdentaltravelliabilitycontentsbikedentallegaltraveldogliabilitycontentslegaldogdentaltravelliabilitytravelcontentsbikeliabilitycontentsdentalcontentsliability35915557303021135393131493122121770302012743213743211611036291113371201414116675452115239219851111
Chanel Vs Product

The significance of the sales channel for GetSafe has been highlighted by the CEO when he emphasised the impact of the Covid-19 crisis on a new business; hence, he said "the customers reconsidering whether or not to make personal appointments with a broker. They are switching to digital insurers and discovering the benefits. Many of them will not switch back after the crisis is over. That is the greatest danger for insurers – and the greatest opportunity for insurgents".

Based on the analysis, GetSafe succeeded to make its biggest sales channel for its policy the aggregators, and half of Getsafe sales is via this digital channel.For GetSafe, aggregators have become, in effect, the customer-facing side of the business. This change has led to a strong networking effect: rising use leads more product providers to employ aggregators as a sales channel while increasing market coverage attracts more users. Increased coverage also leads to better conversion rates, effectively driving down acquisition costs for the aggregator.

The medium to the long-term growth story of the B2C E-commerce industry in Germany promises to be attractive. The B2C E-commerce is expected to grow steadily over the forecast period, recording a CAGR of 13.62% during 2022-2026. The country's B2C e-commerce Gross Merchandise Value will increase from 46,933 million USD to 93,843.4 million by 2026. Reference to McKinsey's report states that Germany has 60% of its retail banking insurance products depend on aggregators.

As a result, I recommend that the growth team and Data team has to name the types of aggregators that deal with like price-comparison, lead-generation only, broker, and product provider; hence, this will not only improve the result of the data analysis but also will create a mapping plan to the growth team to start to write a new strategy of selling the other products via aggregators because we have seen that the majority of the sales via aggregator is the liability product.

In [64]:
Out[64]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant in_age
1217 61 2020-06-19 14:51:00 de affiliate liability 754xx rural Android web 0 A in 60's
816 22 2020-05-18 22:00:00 de affiliate_deal contents 485xx rural Android web 1 B in 20's
1993 25 2020-02-03 14:50:00 de aggregator liability 806xx Big 7 iOS app 0 B in 20's
2767 34 2020-10-15 11:23:00 de aggregator liability 108xx Big 7 Android app 1 B in 30's
In [65]:
Out[65]:
buying_platform most_used_os Minimum_age Maximum_age Average_age
0 app Android 18 73 32.129333
1 app iOS 18 73 29.940610
2 web Android 18 91 34.865049
3 web iOS 18 79 32.736568
In [66]:
Out[66]:
first_product age joined_getsafe_at language channel zip_bucket area_classification most_used_os buying_platform converted variant in_age
0 accident 48 2020-08-02 17:55:00 de direct 306xx urban Android app 1 A in 40's
1 bike 66 2020-05-04 18:46:00 de direct 808xx Big 7 iOS app 1 B in 60's
2 contents 79 2020-08-19 13:28:00 de paid_search 683xx large town Android web 0 A in 70's
3 dental 58 2020-05-24 14:05:00 de direct 676xx town Android app 0 B in 50's
4 dog 58 2020-05-17 17:58:00 de referral 723xx rural iOS web 0 B in 50's
5 legal 67 2020-02-24 15:23:00 de aggregator 031xx rural iOS app 0 A in 60's
6 liability 91 2020-06-01 12:37:00 de affiliate 508xx Big 7 Android web 0 B in 90's
7 travel 73 2020-03-05 12:27:00 de referral 413xx rural iOS app 1 A in 70's

5 | A/B testing for GetSafe marketing Campaign

Table of contents :


Part I — Data Overview
Part II — Feature Engineering
Part III — Data Overveiw
Part IV — Designing Our Experiment
Part V — Choosing the variables
Part VI — Sampling
Part VII — Visualising the result
Part VIII — Testing the hypothesis
Part IX — Drawing conclusions

Part I | Data Overview

In [96]:
In [97]:
Out[97]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant
1246 24 2020-04-01 18:17:00 de paid_social liability 972xx rural iOS web 1 A
5169 30 2020-02-12 14:50:00 en aggregator liability 691xx large town Android app 1 B
1570 28 2020-10-27 9:40:00 de aggregator liability 415xx town Android NaN 1 B
4564 42 2020-07-26 17:25:00 de aggregator liability 463xx rural Android NaN 0 A

Part II | Feature Engineering

In [98]:
Out[98]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant user_id
1656 34 2020-01-13 15:05:00 de aggregator liability 383xx town iOS NaN 0 A 1656
818 26 2020-08-17 8:52:00 de aggregator contents 123xx Big 7 Android app 0 A 818
4052 46 2020-10-29 15:17:00 de aggregator dog 405xx Big 7 Android NaN 1 A 4052
4232 19 2020-05-27 1:13:00 de aggregator liability 651xx large town Android app 1 B 4232
In [99]:
Out[99]:
count    6000.000000
mean     2999.500000
std      1732.195139
min         0.000000
25%      1499.750000
50%      2999.500000
75%      4499.250000
max      5999.000000
Name: user_id, dtype: float64

Part III | Data Overview

In [100]:
Number of unique users in experiment: 6000
Data collected for 303 days
Matrix to compare: ['B', 'A']
Percentage of users in Control: 50%

Since all the users are unique , so no need to get the timestamp of the first conversion;therofore, the following steps will not be applied:
1- Get timestamp of first exposure
2- Remove users with multiple buckets

In [101]:
In [102]:
Out[102]:
False    6000
Name: user_id, dtype: int64

0% user_id has been exposed to the both control and treatment,so all the users are retained for the experiment.

Part IV | Designing our experiment

The GetSafe's marketing team built proxy matric for new feature of app to acquire more younge age , and the product development team worked on a new version of app feature, with the hope that it will lead to a higher conversion rate. By assumaing that the product manager (PM) told us that the current conversion rate is about 7% on average throughout the year, and that the team would be happy with an increase of 3%, meaning that the new design will be considered a success if it raises the conversion rate to 10%.

Part V | Choosing variables

For our test we’ll need two groups under Variant column:

  • A control group - They'll be shown the old design
  • B treatment (or experimental) group - They'll be shown the new design

Part VI | Sampling

In [156]:
1347
In [157]:
Out[157]:
age joined_getsafe_at language channel first_product zip_bucket area_classification most_used_os buying_platform converted variant user_id
0 25 2020-08-22 15:42:00 de affiliate_deal travel 091xx large town Android web 1 A 496
1 30 2020-08-22 13:26:00 de direct liability 134xx Big 7 Android app 1 A 1236
2 35 2020-02-24 17:18:00 de paid_social contents 525xx rural Android app 1 A 2829
3 44 2020-04-05 16:43:00 de paid_social contents 295xx rural Android web 0 A 5237
4 32 2020-05-29 12:34:00 en affiliate liability 133xx Big 7 NaN web 0 A 2490
... ... ... ... ... ... ... ... ... ... ... ... ...
2689 49 2020-10-02 14:19:00 de aggregator dog 508xx Big 7 iOS NaN 0 B 1537
2690 29 2020-08-27 15:50:00 de aggregator liability 109xx Big 7 iOS NaN 0 B 1907
2691 50 2020-08-20 11:33:00 de aggregator contents 713xx town Android NaN 0 B 3795
2692 19 2020-09-11 20:38:00 de aggregator contents 147xx town iOS app 1 B 5243
2693 32 2020-05-14 16:56:00 de affiliate liability 735xx rural iOS NaN 0 B 5562

2694 rows × 12 columns

In [158]:
Out[158]:
A    1347
B    1347
Name: variant, dtype: int64

Part VII | Visualising Results

In [159]:
Out[159]:
  conversion_rate std_deviation std_error
variant      
A 0.456 0.498 0.014
B 0.506 0.500 0.014

Judging by the stats above, it does look like our two designs performed very similarly, with our new design performing slightly better, approx. 50.6% vs. 45.6% conversion rate.

The conversion rates for our groups are indeed very close. Also note that the conversion rate of the control group is lower than what we would have expected given what we knew about our avg. conversion rate (51% vs. 46%). This goes to show that there is some variation in results when sampling from a population.

Part VIII | Testing the hypothesis

The last step of our analysis is testing our hypothesis. Since the sampel is a very large , so the normal approximation for calculating our p-value is used .

In [155]:
z statistic: -2.62
p-value: 0.009
ci 95% for control group: [0.429, 0.482]
ci 95% for treatment group: [0.480, 0.533]

Part IX | Drawing conclusions

Since our p-value=0.90% is below our α=0.05 threshold, we can reject the Null hypothesis Hₒ, which means that our new design did perform significantly differently (let alone better) than our old app feature.

Additionally, if we look at the confidence interval for the treatment group ([0.48, 0.533], or 0.48-53.3%) we notice that:

  • It includes our baseline value of a 7% conversion rate
  • It is above our target value of 10% (the 3% uplift we were aiming for)

What this means is that it is more likely that the true conversion rate of the new app feature is better than our baseline, and rather than the 10% target we had hoped for. This is further proof that our new design targeting "young customers" is likely to be an improvement on our old design of the marketing campaign.
In conclusion, customers who are in their twenties and thirties of age are more likely to convert.